# Image-to-text conversion

Google.gemma 3 4b It Qat Int4 Unquantized GGUF

A quantized version of the Gemma 3 4B image-to-text model, intended to make knowledge accessible to the public.

Ibm Granite.granite Vision 3.2 2b GGUF

Granite Vision 3.2 2B is a vision-language model developed by IBM and focused on image-to-text tasks.

Thai Handwriting Llm

A LoRA-adapted vision-language model based on Llama-3.2-11B-Vision-Instruct, capable of transcribing Thai handwritten text from images.

Safetensors Other

Donut Finetune Rvl Cdip

A document classification model based on the Donut framework, trained on a small-scale RVL-CDIP dataset.

Transformers English

Pix2struct Screen2words Large

A large vision-language model based on the Pix2Struct architecture, fine-tuned to generate functional descriptions of UI screens.

Transformers Supports Multiple Languages

Image Captioning Portuguese

This model generates Portuguese descriptions of images and is built on the ViT and GPT-2 architectures.

Image-to-Text Other

adalbertojunior

© 2025 AIbase